
The Shape of Digits

A Bayesian Topological Data Analytic Approach to Classification of Handwritten Digits

Thomas Reinke

Baylor University

Theophilus A. Bediako

Baylor University

August 13, 2025

Contents

  1. MNIST EDA
  2. Traditional ML
  3. Proposed Methodology
  4. TDA + ML
  5. Results/Future Work
  6. References

Exploratory Data Analysis

Distribution of training labels

  • Around 6000 digits in each class
  • No class imbalance

Pixel intensity representation of the training set

Training data in 2D

We adopt t-distributed Stochastic Neighbor Embedding (t-SNE) to represent the data in 2D. (See details here)
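
As a rough sketch of this step (using scikit-learn's TSNE on toy data in place of the actual MNIST pixel matrix):

```python
import numpy as np
from sklearn.manifold import TSNE

# Toy stand-in for the flattened training images (rows = digits,
# columns = pixel intensities); the real matrix is 60000 x 784.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 64))

# Embed into 2 dimensions; perplexity controls the effective
# neighborhood size and is a tuning choice.
emb = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(X)
print(emb.shape)  # one 2D point per input sample
```

The 2D coordinates can then be scatter-plotted, colored by digit label, to visualize class separation.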


Traditional ML

Neural networks

We use a feedforward neural network with the following structure:

  • input layer: neurons that receive the input data; each neuron represents one feature of the input (here, one pixel)
  • hidden layer: one or more hidden layers placed between the input and output layers, responsible for capturing complex patterns
  • output layer: produces the final output of the network; the number of neurons equals the number of digit classes
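
The three layers above can be sketched as a single forward pass in plain numpy (random placeholder weights and a hypothetical hidden width of 128, not the fitted model from the slides):

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(z):
    return np.maximum(0.0, z)

def softmax(z):
    e = np.exp(z - z.max(axis=1, keepdims=True))  # subtract max for stability
    return e / e.sum(axis=1, keepdims=True)

# 784 input features (28x28 pixels) -> 128 hidden units -> 10 digit classes
W1, b1 = rng.normal(scale=0.01, size=(784, 128)), np.zeros(128)
W2, b2 = rng.normal(scale=0.01, size=(128, 10)), np.zeros(10)

x = rng.random((5, 784))                  # a batch of 5 flattened images
probs = softmax(relu(x @ W1 + b1) @ W2 + b2)
print(probs.shape)                        # one probability row per image
```

Each row of `probs` sums to 1, so the predicted digit is the column with the largest probability.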

NN with regularization

  • Depending on its architecture, a network can have more weights than training samples, which leads to overfitting.
  • We considered two approaches to mitigating overfitting:
    • Dropout learning: analogous to random forests, randomly removes a fraction of the units in a layer during model fitting
    • Regularization: imposes penalties on the parameters, as in lasso and ridge regression

Specific NN models considered

  • NN with dropout regularization
  • NN with ridge regularization
  • NN with lasso regularization
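
The two regularization ideas can be sketched in plain numpy (toy shapes and a hypothetical penalty weight, not the fitted models above):

```python
import numpy as np

rng = np.random.default_rng(1)

# Dropout: during training, zero a random fraction of hidden activations
# and rescale the survivors ("inverted dropout" keeps the expected value).
h = rng.random((4, 8))                 # toy hidden-layer activations
rate = 0.5                             # fraction of units to drop
mask = rng.random(h.shape) >= rate
h_dropped = (h * mask) / (1.0 - rate)

# Ridge (L2) and lasso (L1) add a weight penalty to the training loss.
W = rng.normal(size=(8, 10))           # toy weight matrix
lam = 1e-3                             # hypothetical penalty strength
ridge_penalty = lam * np.sum(W ** 2)
lasso_penalty = lam * np.sum(np.abs(W))
print(h_dropped.shape, ridge_penalty > 0, lasso_penalty > 0)
```

In practice the dropout mask is resampled every batch and disabled at prediction time, while the penalty term is simply added to the cross-entropy loss before each gradient step.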

Multinomial logistic regression

Multinomial logistic regression is equivalent to a NN with just input and output layers. There are no hidden layers.

  • Output layer with softmax
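
A minimal sketch of this baseline, using scikit-learn's LogisticRegression on toy three-class data (a stand-in for the ten digit classes):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Three Gaussian clusters in 20 dimensions, 100 samples each.
rng = np.random.default_rng(0)
centers = rng.normal(size=(3, 20))
X = rng.normal(size=(300, 20)) + np.repeat(centers, 100, axis=0)
y = np.repeat([0, 1, 2], 100)

# With more than two classes, this fits a multinomial (softmax) model:
# one linear score per class, mapped to probabilities by softmax --
# exactly a neural network with no hidden layer.
clf = LogisticRegression(max_iter=1000).fit(X, y)
probs = clf.predict_proba(X[:2])
print(probs.shape)  # one probability row per sample, one column per class
```

The predicted class is the column of `predict_proba` with the largest probability, mirroring the argmax over the softmax output layer.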

Training procedure

We train the network for 30 epochs with a batch size of 32. Images from the training set are presented to the network in batches of 32 at a time; for each batch, the SGD algorithm updates the network’s weights by an appropriate amount. Another batch of 32 images is then presented, and so on, until all 60,000 training images have been processed, which constitutes one epoch of training. This cycle is repeated for 30 epochs. As training proceeds, the network’s error (loss) on the training and validation sets is shown on the left graph, and the accuracy on each set on the right graph. Accuracy is the fraction of input images the network classifies correctly; a classification is correct when the largest value in the output layer corresponds to the target class.
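
The epoch/batch bookkeeping described above can be sketched as follows (a toy dataset of 160 samples stands in for the 60,000 MNIST images, with the same 32-sample batches):

```python
import numpy as np

rng = np.random.default_rng(0)
n, batch_size, epochs = 160, 32, 3
X = rng.random((n, 4))               # toy training set

updates = 0
for epoch in range(epochs):
    order = rng.permutation(n)       # reshuffle the data each epoch
    for start in range(0, n, batch_size):
        batch = X[order[start:start + batch_size]]
        # ... one SGD weight update per batch would happen here ...
        updates += 1
print(updates)  # 3 epochs * (160 / 32) = 15 batch updates
```

With the real data, one epoch is 60,000 / 32 = 1,875 batch updates, so 30 epochs perform 56,250 weight updates in total.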

Proposed Methodology

TDA Workflow

TDA + ML

Analysis

ML Analysis

Proposed Method Analysis

TDA + ML Analysis

Results/Future Work

ML Results

method        accuracy
multinomial   0.9855
dropout nn    0.9965
ridge nn      0.9948
lasso nn      0.9948

Proposed Results

ML + TDA Results

References

